Tiling of Iteration Spaces for Multicomputers

Authors

  • J. Ramanujam
  • P. Sadayappan
Abstract

We deal with compiler support for parallelizing perfectly nested loops for coarse-grain distributed memory machines. The relatively high communication start-up costs in these machines render frequent communication very expensive. We study the effect on performance of clustering communication and the ensuing loss of parallelism, and propose a method for aggregating a number of loop iterations into "tiles" that execute atomically, that is, no synchronization is performed during the execution of a tile. As a result, it is important that dividing the loops into tiles does not lead to deadlock. Based on conditions for deadlock-free tiles, we present a method for deriving legal tiles for nested loops. We then develop an approach to optimize the shape and size of tiles, along with the assignment of tiles to processors, for load-balanced execution with reduced communication costs on distributed memory machines, given communication setup and transfer rates and instruction execution rates.
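As an illustration of what aggregating iterations into atomically executed tiles means, here is a minimal sketch. It is not the authors' method: it assumes a hypothetical two-deep loop nest with dependence vectors (1, 0) and (0, 1), for which rectangular tiles are legal because every dependence component is non-negative. The function names `run_untiled` and `run_tiled` and the tile sizes `ti`, `tj` are invented for this example.

```python
def run_untiled(n, m):
    """Reference loop nest: each iteration (i, j) reads (i-1, j) and (i, j-1)."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def run_tiled(n, m, ti, tj):
    """Same computation, grouped into ti x tj rectangular tiles.

    Each tile runs to completion before the next one starts, so on a
    multicomputer boundary data would only need to be exchanged once per
    tile rather than once per iteration (assumption of this sketch).
    Returns the result array and the number of tiles executed.
    """
    a = [[1] * (m + 1) for _ in range(n + 1)]
    tiles = 0
    for ii in range(1, n + 1, ti):          # loops over tile origins
        for jj in range(1, m + 1, tj):
            tiles += 1
            # Intra-tile loops: no synchronization inside a tile.
            for i in range(ii, min(ii + ti, n + 1)):
                for j in range(jj, min(jj + tj, m + 1)):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a, tiles


if __name__ == "__main__":
    tiled, count = run_tiled(6, 7, 2, 3)
    print(tiled == run_untiled(6, 7), count)
```

Because a tile only needs values produced by the tiles above it and to its left, executing tiles in lexicographic order never makes a tile wait on one that has not yet run, which is the deadlock-freedom property the abstract refers to. Larger tiles mean fewer exchanges but less parallelism, which is the trade-off the paper sets out to optimize.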

Similar resources

Tiling Multidimensional Iteration Spaces for Multicomputers

This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles where the tiles execute atomically – a processor executing the iteration...

Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expres...

Reducing Data Communication Overhead for Doacross Loop Nests

If the iterations of a loop nest cannot be partitioned into independent sets, data communication for data dependences is inevitable in order to execute them on parallel machines. Loop nests of this kind are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...

Publication date: 1990